
Doubly robust estimator




Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Neural Information Processing Systems

Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior policy to evaluate and learn new policies, is crucial in applications where experimentation is limited, such as medicine. We study the estimation of the policy value and gradient of a deterministic policy from off-policy data when actions are continuous. Targeting deterministic policies, for which the action is a deterministic function of the state, is crucial since optimal policies are always deterministic (up to ties). In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist. To circumvent this issue, we propose several new doubly robust estimators based on different kernelization approaches. We analyze the asymptotic mean-squared error of each of these under mild rate conditions for nuisance estimators. Specifically, we demonstrate how to obtain a rate that is independent of the horizon length.
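The core idea — replacing the ill-defined density ratio with a kernel smoothed around the deterministic policy's action — can be sketched in a one-step (contextual bandit) setting. All names here (`kernelized_dr_value`, `q_hat`, `b_density`) are illustrative, and the paper's sequential, horizon-independent estimators are considerably more involved:

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h; integrates to 1 over u."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def kernelized_dr_value(s, a, r, pi, q_hat, b_density, h=0.5):
    """Kernelized doubly robust value estimate of a deterministic
    policy pi in a one-step setting with continuous actions.

    The indicator a == pi(s), whose density ratio does not exist,
    is smoothed by a kernel of bandwidth h around pi(s)."""
    target_a = pi(s)                       # deterministic action at each state
    direct = q_hat(s, target_a)            # direct-method (outcome-model) term
    weight = gaussian_kernel(a - target_a, h) / b_density(a, s)
    correction = weight * (r - q_hat(s, a))
    return np.mean(direct + correction)
```

When the outcome model `q_hat` is exact, the correction term vanishes sample by sample, illustrating the usual doubly robust structure; the kernel bandwidth `h` trades smoothing bias against the variance of the weights.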




Rescuing double robustness: safe estimation under complete misspecification

Testa, Lorenzo, Chiaromonte, Francesca, Roeder, Kathryn

arXiv.org Machine Learning

Double robustness is a major selling point of semiparametric and missing data methodology. Its virtues lie in protection against partial nuisance misspecification and asymptotic semiparametric efficiency under correct nuisance specification. However, in many applications, complete nuisance misspecification should be regarded as the norm (or at the very least the expected default), and thus doubly robust estimators may behave fragilely. In fact, it has been amply verified empirically that these estimators can perform poorly when all nuisance functions are misspecified. Here, we first characterize this phenomenon of double fragility, and then propose a solution based on adaptive correction clipping (ACC). We argue that our ACC proposal is safe, in that it inherits the favorable properties of doubly robust estimators under correct nuisance specification, but its error is guaranteed to be bounded by a convex combination of the individual nuisance model errors, which prevents the instability caused by the compounding product of errors of doubly robust estimators. We also show that our proposal provides valid inference through the parametric bootstrap when nuisances are well-specified. We showcase the efficacy of our ACC estimator both through extensive simulations and by applying it to the analysis of Alzheimer's disease proteomics data.
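As a rough illustration of why capping the augmentation term can prevent the compounding-error instability (the paper's ACC rule is adaptive and differs from the fixed cap shown here), consider a standard AIPW estimator of a mean outcome under missingness:

```python
import numpy as np

def aipw_clipped(x, y, observed, m_hat, pi_hat, clip=None):
    """AIPW estimator of E[Y] under missing-at-random data, with an
    optional cap on the per-sample correction term. A fixed clip is
    only a stand-in for the adaptive rule in the ACC proposal."""
    mu = m_hat(x)                          # outcome-model prediction
    w = observed / pi_hat(x)               # inverse-propensity weight
    corr = w * (y - mu)                    # doubly robust augmentation term
    if clip is not None:
        corr = np.clip(corr, -clip, clip)  # bound the correction's influence
    return np.mean(mu + corr)
```

With both nuisances misspecified, the unclipped correction `w * (y - mu)` can blow up through the product of errors; the cap bounds its contribution, at the cost of some bias when the nuisances are in fact correct.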


Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety

Dai, Chi-Shian, Ying, Chao, Ning, Yang, Zhao, Jiwei

arXiv.org Machine Learning

Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure can be applied, requiring either the propensity score model or the control outcome model to be correctly specified. In this paper, we address scenarios where external control data, often with a much larger sample size, are available. Such data are typically easier to obtain from historical records or third-party sources. However, we find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency compared to using the estimator without external controls. This counterintuitive outcome suggests that the naive incorporation of external controls could be detrimental to estimation efficiency. To resolve this issue, we propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach without external controls, even under model misspecification. When all models are correctly specified, this estimator aligns with the standard doubly robust estimator that incorporates external controls and achieves semiparametric efficiency. The asymptotic theory developed in this work applies to high-dimensional confounder settings, which are increasingly common with the growing prevalence of electronic health record data. We demonstrate the effectiveness of our methodology through extensive simulation studies and a real-world data application.
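The baseline the paper improves on — the standard doubly robust ATT estimator without external controls — can be sketched as follows (function names are illustrative):

```python
import numpy as np

def dr_att(x, t, y, m0_hat, e_hat):
    """Standard doubly robust estimator of the average treatment
    effect on the treated (ATT): consistent if either the control
    outcome model m0_hat or the propensity model e_hat is correct."""
    m0 = m0_hat(x)                              # predicted control outcome
    e = e_hat(x)                                # estimated propensity score
    p1 = t.mean()                               # empirical Pr(T = 1)
    treated_term = t * (y - m0)                 # treated minus imputed control
    control_term = (1 - t) * e / (1 - e) * (y - m0)  # reweighted controls
    return np.mean(treated_term - control_term) / p1
```

Note the double robustness: if `m0_hat` is exact, `y - m0` is mean-zero among controls and the estimator recovers the ATT regardless of how badly `e_hat` is misspecified, and vice versa.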



Path-specific effects for pulse-oximetry guided decisions in critical care

Zhang, Kevin, Jung, Yonghan, Mahajan, Divyat, Shanmugam, Karthikeyan, Joshi, Shalmali

arXiv.org Machine Learning

Identifying and measuring biases associated with sensitive attributes is a crucial consideration in healthcare to prevent treatment disparities. One prominent issue is inaccurate pulse oximeter readings, which tend to overestimate oxygen saturation for dark-skinned patients and misrepresent supplemental oxygen needs. Most existing research has revealed statistical disparities linking device errors to patient outcomes in intensive care units (ICUs) without causal formalization. In contrast, this study causally investigates how racial discrepancies in oximetry measurements affect invasive ventilation in ICU settings. We employ a causal inference-based approach using path-specific effects to isolate the impact of bias by race on clinical decision-making. To estimate these effects, we leverage a doubly robust estimator, propose its self-normalized variant for improved sample efficiency, and provide novel finite-sample guarantees. Our methodology is validated on semi-synthetic data and applied to two large real-world health datasets: MIMIC-IV and eICU. Contrary to prior work, our analysis reveals minimal impact of racial discrepancies on invasive ventilation rates. However, path-specific effects mediated by oxygen saturation disparity are more pronounced on ventilation duration, and the severity differs by dataset. Our work provides a novel and practical pipeline for investigating potential disparities in the ICU and, more crucially, highlights the necessity of causal methods to robustly assess fairness in decision-making.
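A self-normalized variant of a doubly robust estimator, in the generic sense of normalizing importance weights by their empirical mean (the paper's concrete estimator for path-specific effects is more involved), looks like:

```python
import numpy as np

def dr_estimate(y, w, q):
    """Plain DR estimate: q is the outcome-model prediction,
    w the importance weight for each sample."""
    return np.mean(q + w * (y - q))

def self_normalized_dr(y, w, q):
    """Self-normalized (Hajek-style) variant: dividing the weights
    by their empirical mean trades a small finite-sample bias for
    lower variance when the weights are heavy-tailed."""
    w_norm = w / np.mean(w)
    return np.mean(q + w_norm * (y - q))
```

Both estimators agree when the outcome model is exact (the correction term vanishes); they differ only in how the weighted correction is normalized, which is where the improved sample efficiency comes from.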


Causal LLM Routing: End-to-End Regret Minimization from Observational Data

Tsiourvas, Asterios, Sun, Wei, Perakis, Georgia

arXiv.org Machine Learning

LLM routing aims to select the most appropriate model for each query, balancing competing performance metrics such as accuracy and cost across a pool of language models. Prior approaches typically adopt a decoupled strategy, where the metrics are first predicted and the model is then selected based on these estimates. This setup is prone to compounding errors and often relies on full-feedback data, where each query is evaluated by all candidate models, which is costly to obtain and maintain in practice. In contrast, we learn from observational data, which records only the outcome of the model actually deployed. We propose a causal end-to-end framework that learns routing policies by minimizing decision-making regret from observational data. To enable efficient optimization, we introduce two theoretically grounded surrogate objectives: a classification-based upper bound, and a softmax-weighted regret approximation shown to recover the optimal policy at convergence. We further extend our framework to handle heterogeneous cost preferences via an interval-conditioned architecture. Experiments on public benchmarks show that our method outperforms existing baselines, achieving state-of-the-art performance across different embedding models.
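A minimal sketch of a softmax-weighted regret surrogate of the kind described — the function names, temperature parameter, and exact objective are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def softmax(z, tau=1.0, axis=-1):
    """Numerically stable softmax with temperature tau."""
    z = z / tau
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_regret(scores, rewards, tau=1.0):
    """Softmax-weighted regret surrogate: the expected shortfall of
    the stochastic softmax routing policy relative to the per-query
    best model. As tau -> 0 the softmax approaches argmax, so
    driving this surrogate to zero recovers an optimal
    deterministic routing policy."""
    p = softmax(scores, tau)                         # routing distribution per query
    regret = rewards.max(axis=1, keepdims=True) - rewards
    return np.mean((p * regret).sum(axis=1))
```

In practice `rewards` would come from a doubly robust estimate built on observational logs rather than full feedback, and `scores` from a trainable router; the surrogate is differentiable in the scores, which is what makes end-to-end regret minimization feasible.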